Compressed sensing is the art of reconstructing a sparse vector from its inner products with respect to a small set of randomly chosen measurement vectors. It is usually assumed that the ensemble of measurement vectors is in isotropic position in the sense that the associated covariance matrix is proportional to the identity matrix. In this paper, we establish bounds on the number of required measurements in the anisotropic case, where the ensemble of measurement vectors possesses a non-trivial covariance matrix. Essentially, we find that the required sampling rate grows proportionally to the condition number of the covariance matrix. In contrast to other recent contributions to this problem, our arguments do not rely on any restricted isometry properties (RIPs), but rather on ideas from convex geometry which have been systematically studied in the theory of low-rank matrix recovery. This allows for a simple argument and slightly improved bounds, but may lead to a worse dependency on noise (which we do not consider in the present paper).



I. INTRODUCTION AND RESULTS

Compressed sensing is a highly active research field in statistics and signal analysis [1]–[4]. It can be thought of as being concerned with establishing Nyquist-type sampling theorems for signals which are sparse, rather than band-limited.

More precisely, let $x \in \mathbb{C}^n$ be a vector with no more than $s$ non-zero entries (i.e. $x$ is $s$-sparse). Suppose we have no information about $x$ apart from its sparsity and the inner products $\langle a_i, x\rangle$, $i = 1, \dots, m$, between $x$ and $m \ll n$ vectors $a_i$. The central question is: under what conditions on $m$ and the $a_i$'s is it possible to recover $x$ uniquely and computationally efficiently? Early celebrated results [1]–[3] established, e.g., that if the measurement vectors $\{a_i\}$ are randomly chosen discrete Fourier vectors and $m = O(s \log n)$, then, with high probability, the unknown vector $x$ is the unique minimizer of the $\ell_1$-norm in the affine space defined by the known inner products.

The precise statement of our results in this introductory section will follow very closely the exposition in [5]. The reason for this approach, and the relation of the present paper with other work (in particular [6]), is explained in Section II.

We make the following definitions: Let $F$ be an ensemble of random vectors on $\mathbb{C}^n$. Let $a_1, \dots, a_m$ be a sequence of i.i.d. random vectors drawn from $F$. Define the sampling matrix

$$A := \frac{1}{\sqrt{m}} \sum_{i=1}^{m} e_i a_i^*.$$
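As a concrete illustration, here is a minimal NumPy sketch of the sampling matrix, assuming the normalization $A = \frac{1}{\sqrt{m}}\sum_i e_i a_i^*$, so that row $i$ of $A$ is $a_i^*/\sqrt{m}$ and $A^*A$ is the empirical covariance $\frac{1}{m}\sum_i a_i a_i^*$. The function name `sampling_matrix` and the diagonal-covariance Gaussian test ensemble are illustrative assumptions, not part of the text.

```python
import numpy as np


def sampling_matrix(vectors):
    """Return A = (1/sqrt(m)) * sum_i e_i a_i^*, given the a_i as rows of `vectors`."""
    a = np.asarray(vectors)
    m = a.shape[0]
    # Row i of A is the adjoint a_i^* (a conjugated row vector), scaled by 1/sqrt(m).
    return a.conj() / np.sqrt(m)


rng = np.random.default_rng(0)
m, n = 50, 8
# Hypothetical anisotropic ensemble: complex Gaussian vectors, coordinate j scaled by scales[j].
scales = np.linspace(0.5, 2.0, n)
vecs = (rng.standard_normal((m, n)) + 1j * rng.standard_normal((m, n))) * scales
A = sampling_matrix(vecs)

# A^* A is the empirical covariance (1/m) sum_i a_i a_i^*.
emp_cov = sum(np.outer(v, v.conj()) for v in vecs) / m
print(np.allclose(A.conj().T @ A, emp_cov))  # True
```

With this convention, the $i$th entry of $Ax$ is $\langle a_i, x\rangle/\sqrt{m}$, so the constraint $A\hat{x} = Ax$ in (1) fixes exactly the measured inner products.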

Once more, let $x$ be an $s$-sparse vector. We aim to prove that with high probability the solution $x^*$ to the convex optimization problem

$$\min_{\hat{x} \in \mathbb{C}^n} \|\hat{x}\|_1 \quad \text{subject to} \quad A\hat{x} = Ax \tag{1}$$

is unique and equal to $x$ given that the number of measurements $m$ is large enough.

It turns out that the required size of $m$ depends only on two simple properties of the ensemble $F$. These are identified below:

Completeness: We require that the ensemble $F$ is complete in the sense that the covariance matrix $\Sigma = \mathbb{E}[aa^*]^{1/2}$ is invertible. The condition number$^1$ of $\Sigma$ will be denoted by $\kappa$.

Most of the previous work has focused on the case where the covariance matrix is proportional to the identity matrix, $\Sigma \propto \mathbb{1}$ (however, see Section II). We refer to this case as the isotropic one.

In order to describe the second relevant property of the ensemble, we have to fix a scale. Indeed, note that the minimizer of the convex problem (1) is invariant under re-scaling of the ensemble (i.e. substituting $a_i$ by $\nu a_i$ for a number $\nu \neq 0$). The same is true for the condition number $\kappa$. Thus, we are free to pick an advantageous scale, without affecting the notions introduced so far. In the isotropic case, a natural normalization convention [5] consists in requiring that $\mathbb{E}[aa^*] = \mathbb{1}$. This option is not available in the more general, anisotropic case we are interested in here. Instead, we will implicitly demand from now on that

$$\lambda_{\max}(\mathbb{E}[aa^*]) = \lambda_{\min}(\mathbb{E}[aa^*])^{-1}, \tag{2}$$

where $\lambda_{\max}$, $\lambda_{\min}$ denote the maximal and the minimal eigenvalue, respectively. In the isotropic case, this reduces to the normalization $\mathbb{E}[aa^*] = \mathbb{1}$ used in [5].

The fact that (2) can always be achieved (and further properties that follow from it) will be established in Lemma 8 below.

With this convention, we define: 
1 Recall that the condition number of a matrix is the ratio between its largest and its smallest singular value.
Incoherence: The incoherence parameter is the smallest number $\mu$ such that

$$\max_{1 \le i \le n} |\langle a, e_i \rangle|^2 \le \mu, \tag{3}$$

$$\max_{1 \le i \le n} |\langle a, \mathbb{E}[aa^*]^{-1} e_i \rangle|^2 \le \mu \tag{4}$$

holds almost surely.
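For an ensemble supported on finitely many vectors, "almost surely" means "for every vector drawn with positive probability", so $\mu$ can be computed exactly. A minimal sketch under that assumption; the helper `incoherence` and its arguments are hypothetical.

```python
import numpy as np


def incoherence(vectors, probs=None):
    """Smallest mu satisfying (3) and (4) for a finite ensemble.

    Row k of `vectors` is drawn with probability probs[k] (uniform if None).
    """
    a = np.asarray(vectors, dtype=complex)
    p = np.full(a.shape[0], 1.0 / a.shape[0]) if probs is None else np.asarray(probs, float)
    cov = (a.T * p) @ a.conj()             # E[aa^*] = sum_k p_k a_k a_k^*
    X = np.linalg.inv(cov)                 # E[aa^*]^{-1}
    atoms = a[p > 0]                       # only vectors that can actually be drawn
    mu3 = np.max(np.abs(atoms) ** 2)                 # max_{k,i} |<a_k, e_i>|^2
    mu4 = np.max(np.abs(atoms.conj() @ X) ** 2)      # max_{k,i} |<a_k, X e_i>|^2
    return max(mu3, mu4)
```

For example, the rescaled standard basis $\{\sqrt{n}\, e_k\}$ drawn uniformly has $\mathbb{E}[aa^*] = \mathbb{1}$ and $\mu = n$, whereas uniformly drawn Fourier vectors with unit-modulus entries have $\mu = 1$.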
The previously known isotropic result we aim to generalize is:

Theorem 1 ([5]). Let $x$ be an $s$-sparse vector in $\mathbb{C}^n$. If we demand isotropy ($\mathbb{E}[aa^*] = \mathbb{1}$) and if the number of measurements fulfills

$$m \ge C_\omega \mu s \log n,$$

then the solution $x^*$ of the convex program (1) is unique and equal to $x$ with probability at least $1 - 5/n - e^{-\omega}$.

In the statement above, $C_\omega$ may be chosen as $C_0(1 + \omega)$ for some positive numerical constant $C_0$.

Our main theorem reads: 

Theorem 2 (Main Theorem). Let $x \in \mathbb{C}^n$ be an $s$-sparse vector, let $\omega \ge 1$. If the number of measurements fulfills

$$m \ge C \omega^2 \kappa \mu s \log n,$$

then the solution $x^*$ of the convex program (1) is unique and equal to $x$ with probability at least $1 - e^{-\omega}$.

In the statement above, $C$ is a constant less than 18044. For $n, s$ sufficiently large, the value may be improved to $C < 228$. We have made no attempts to optimize these constants.
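To get a feeling for the bound, the following sketch simply evaluates $C\omega^2\kappa\mu s\log n$ (natural logarithm assumed) for the two constants quoted above. The function name and the example parameters are hypothetical.

```python
import math


def main_theorem_rate(s, n, mu, kappa, omega, C=18044.0):
    """Sufficient number of measurements m >= C * omega^2 * kappa * mu * s * log n."""
    if omega < 1:
        raise ValueError("the Main Theorem assumes omega >= 1")
    return math.ceil(C * omega**2 * kappa * mu * s * math.log(n))


# Hypothetical parameters: 10-sparse vector in dimension 10^6, mu = 1, kappa = 2.
for C in (18044.0, 228.0):
    print(C, main_theorem_rate(s=10, n=10**6, mu=1.0, kappa=2.0, omega=1.0, C=C))
```

With the constant 18044 the bound exceeds $n = 10^6$ in this example, while the asymptotic constant 228 gives a rate well below $n$; this reflects the remark that the constants were not optimized.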

Comparing these two theorems, we see that the effect of dropping the isotropy constraint on the ensemble can essentially be captured in a single, simple quantity: the condition number $\kappa$ of the covariance matrix. All other minor differences between Theorem 1 and Theorem 2 result from slightly different proof techniques.



A. Improvements 

A first way of improving the result is based on a definition borrowed from [6, Def. 1.2]$^2$:

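Definition 3. The largest and smallest $s$-sparse eigenvalue of a matrix $X$ are given by

$$\lambda_{\max}(s, X) := \max_{v \neq 0,\ \|v\|_0 \le s} \frac{\|Xv\|_2}{\|v\|_2}, \tag{5}$$

$$\lambda_{\min}(s, X) := \min_{v \neq 0,\ \|v\|_0 \le s} \frac{\|Xv\|_2}{\|v\|_2}. \tag{6}$$

The $s$-sparse condition number$^3$ of $X$ is

$$\operatorname{cond}(s, X) := \frac{\lambda_{\max}(s, X)}{\lambda_{\min}(s, X)}.$$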
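Definition 3 can be evaluated by brute force, which is the naive approach mentioned in footnote 3. For $v$ supported on $S$, the extreme values of $\|Xv\|_2/\|v\|_2$ are the extreme singular values of the column submatrix $X_{:,S}$; enlarging $S$ can only increase the largest and decrease the smallest singular value, so supports of size exactly $s$ suffice. A sketch with hypothetical helper names, costing $\binom{n}{s}$ small SVDs:

```python
import itertools

import numpy as np


def sparse_extreme_values(X, s):
    """Brute-force lambda_max(s, X) and lambda_min(s, X) from Definition 3."""
    n = X.shape[1]
    lam_max, lam_min = 0.0, np.inf
    for S in itertools.combinations(range(n), s):
        # Singular values of the n x s column submatrix, in descending order.
        sv = np.linalg.svd(X[:, list(S)], compute_uv=False)
        lam_max = max(lam_max, sv[0])
        lam_min = min(lam_min, sv[-1])
    return lam_max, lam_min


def sparse_cond(X, s):
    """s-sparse condition number cond(s, X)."""
    lam_max, lam_min = sparse_extreme_values(X, s)
    return lam_max / lam_min
```

For $s = n$ this reduces to the ordinary condition number (`sparse_cond(X, X.shape[0])` agrees with `np.linalg.cond(X)`), and $\operatorname{cond}(s, X)$ is non-decreasing in $s$.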



Based on this notion, one can state a strictly stronger version of the Main Theorem (which is the form we will prove in Section III):

Theorem 4. With

$$\kappa_s := \max\{\operatorname{cond}(s, \Sigma),\ \operatorname{cond}(s, \Sigma^{-1})\},$$

the conclusion of the Main Theorem (Theorem 2) continues to hold if the lower bound on $m$ is weakened to

$$m \ge C \mu \kappa_s \omega^2 s \log n,$$

for the same constant $C$.

We further suspect that the second incoherence condition (4) can be relaxed. Two alternative bounds not relying on that condition are stated in Proposition 5 below. (The modifications of our proof necessary to arrive at these improved estimates will be sketched after Lemma 9.)

Proposition 5. Let $K$ be a constant such that

$$2\,\big\|[aa^*, \mathbb{E}[aa^*]^{-1}]\big\|_\infty \le K$$

holds almost surely.

If the requirement (4) is not necessarily fulfilled, the conclusions of Theorem 2 remain valid if the sampling rate is bounded below by either

$$m \ge C \mu \kappa \omega^2 s^2 \log n \tag{7}$$

or

$$m \ge C (s \mu \kappa + K) \omega^2 \log n. \tag{8}$$



The commutator bound (8) is particularly relevant for ensembles corresponding to non-uniform samples from an orthogonal basis. In that case, $\mathbb{E}[aa^*]$ and $aa^*$ commute with probability one, so that $K$ may be chosen to be zero.
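A quick numerical check of this remark, assuming a finite ensemble that picks a weighted standard basis vector $w_i e_i$ with probability $p_i$: then $\mathbb{E}[aa^*]$ is diagonal and commutes with every $aa^*$. The sketch computes $\max \|[aa^*, \mathbb{E}[aa^*]^{-1}]\|_\infty$ over the atoms, the commutator norm that $K$ controls; the helper name and the comparison ensemble are hypothetical.

```python
import numpy as np


def max_commutator_norm(vectors, probs):
    """max over atoms a of ||[a a^*, E[aa^*]^{-1}]|| (operator norm) for a finite ensemble."""
    a = np.asarray(vectors, dtype=complex)
    p = np.asarray(probs, dtype=float)
    cov = (a.T * p) @ a.conj()            # E[aa^*] = sum_k p_k a_k a_k^*
    X = np.linalg.inv(cov)
    worst = 0.0
    for v in a[p > 0]:
        P = np.outer(v, v.conj())         # a a^*
        worst = max(worst, np.linalg.norm(P @ X - X @ P, 2))
    return worst


# Non-uniform samples from a (weighted) orthonormal basis: the commutator vanishes.
basis = np.diag([1.0, 2.0, 3.0, 4.0])     # rows are w_i e_i
print(max_commutator_norm(basis, [0.1, 0.2, 0.3, 0.4]))
# A non-orthogonal ensemble in C^2 for comparison: the commutator does not vanish.
print(max_commutator_norm([[1.0, 0.0], [1.0, 1.0]], [0.5, 0.5]))
```

The first value is zero; the second equals 2 up to rounding, so for this ensemble only the first bound (7) of Proposition 5 avoids a non-trivial $K$.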

There is another degree of freedom which we have not yet systematically explored: Note that the minimizer of the convex optimization (1) does not change if we re-scale individual vectors $a_i \mapsto \nu_i a_i$ for some set of non-zero numbers $\nu_i$. While we have chosen a global scale for the covariance matrix (cf. Lemma 8), the individual weights remain free parameters that may be used to optimize the sampling rate. Pursuing this problem further seems likely to be fruitful.



2 In fact, our definition differs very slightly from [6]: their $\rho_{\max}(s, X)$ is the square of our $\lambda_{\max}(s, X)$. We opted for this change because the notions defined here reduce to the ordinary eigenvalues in the case of $s = n$.



3 We are not aware of an efficient algorithm for estimating $\kappa_s$ in general. A naive evaluation following the definition would require looking at all $\binom{n}{s}$ possible supports of an $s$-sparse vector in $\mathbb{C}^n$. An obvious approach would be to pass to a suitable convex relaxation in terms of $\ell_1$-norm constraints, but we do not currently know how well this would work.
We remark that the incoherence conditions can be relaxed to hold only with high probability. This opens up our results to, for example, the case of Gaussian measurement vectors. The details can be developed in complete analogy to [5, Appendix B].

Lastly, all statements remain true if the measurement vectors are drawn "without replacement" instead of independently; cf. [17] for details.



II. RELATION WITH PREVIOUS WORK AND HISTORY 

Most results on sparse vector recovery have relied on a certain restricted isometry property (RIP) [1], [3], [6], a quantitative description of how much a given sampling matrix $A$ distorts the Euclidean geometry of the set of sparse vectors.

From roughly 2008 on, the conceptually strongly related problem of recovering a low-rank matrix from a few matrix elements has come more and more into focus [7], [8]. RIP-based techniques are not applicable here, because there is no incoherence (in a sense analogous to the definition above) between the set of all low-rank matrices and the set of matrices with a single non-zero entry. The latter set would be the analogue of the measurement vectors considered here. (However, cf. [9], [10] for interesting cases where RIP techniques are applicable to low-rank matrix recovery problems, and [11] for a related "restricted strong convexity" property with consequences for matrix recovery.) Instead, pioneering publications on the matrix problem used fairly elaborate methods from convex duality theory.

In [12], [13] the second author and his collaborators introduced a simplified approach to the low-rank matrix recovery problem. New ideas included the use of non-commutative large deviation theorems originating from quantum information theory [14], [15], randomized constructions based on i.i.d. samples of the measurement vectors, and a certain iterative "golfing scheme" for the construction of dual certificates. These techniques were later modified and adapted to the original sparse-vector setting in [5]. This showed that the conceptual closeness of the matrix and the vector theory may be used to devise very similar proofs.

This "RIPless" approach to compressed sensing arguably leads to simpler proofs and gives tighter bounds, at least for the noise-free recovery problem. As far as we know, RIP-based arguments still perform better in the important noisy regime.

The work [5] did not include a systematic study of non-isotropic ensembles (however, "small" deviations from isotropy were discussed in Appendix B). In fact, E. Candès suggested to us the problem of finding a generalization of the golfing scheme that could cope with anisotropic ensembles. This has been achieved by the first author of this paper during a research project under the supervision of the second author [16]. This explains the close relation between [5] and the present work.

An analysis of anisotropic compressed sensing within the original RIP framework has been carried out by other authors, most notably in [6]. Since their paper does not directly address the noise-free case, a direct comparison of statements is difficult. The closest result to ours seems to appear in Section 1.3 of [6], where a bound of

$$m \ge O\big(s M^2 \log n \log^3(s \log n)\big)$$

for the sampling rate is given. The quantity $M$ is an upper bound on the largest coefficient of the measurement vectors $a_i$, related to our parameter $\mu$. The big-Oh notation hides a constant proportional to $\kappa$ ($\rho_{\min}^{-1}$ in the language of [6]). Thus, the basic structure of the results is very similar. However, some important differences are these:

• We do not incur the $\log^3$-term.

• The result in [6] holds uniformly in the sense that, with their probability of success, one obtains a sampling matrix which works simultaneously for all sparse vectors. This is not the case for us.

• We have proved no results on noise-resilience. While, following [5], it should be straightforward to do so, the results may be worse than the RIP-based ones in [6].

• The proof methods are completely different.

III. PROOF 

The proof is conceptually close to [5], which in turn closely resembles [17]. We have still opted to give a largely self-contained presentation.

A. Notation 

Throughout this paper, we will use the following conventions:

If a statement holds almost surely, we will abbreviate this by a.s. In the case of vectors, $\|\cdot\|_p$ denotes the $\ell_p$-norm, whereas in the operator case $\|\cdot\|_p$ refers to the Schatten-$p$ norm (i.e. the $\ell_p$-norm of the singular values). The letter $y$ will always denote a vector in $\mathbb{C}^n$, supported on a set $T$ of cardinality at most $s$ (i.e. $y$ is $s$-sparse). $T^c$ shall denote the complement of $T$, and $P_T$ ($P_{T^c}$) refers to the orthogonal projector onto the set of all vectors supported on $T$ ($T^c$). Finally, we will use the following technical definitions:

$$X = (\mathbb{E}[aa^*])^{-1} = \Sigma^{-2}, \qquad X_T = P_T X P_T.$$

B. Large deviation bounds 

A central role in the argument is played by certain large deviation bounds for sums of matrix-valued random variables. These have been introduced in [14] in the context of quantum information theory. The first application to matrix completion and compressed sensing problems, as well as the first "Bernstein version" taking variance information into account, appeared in [12], [17]. The version we will be making use of derives from Theorem 1.6 in [15].



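Proposition 6 (Matrix Bernstein inequality [15]). Consider a finite sequence $\{M_k\} \subset \mathbb{C}^{d \times d}$ of independent, random matrices. Assume that each random matrix satisfies $\mathbb{E}[M_k] = 0$ and $\|M_k\|_\infty \le B$ a.s. and define

$$\sigma^2 := \max\Big\{ \Big\|\sum_k \mathbb{E}(M_k M_k^*)\Big\|_\infty,\ \Big\|\sum_k \mathbb{E}(M_k^* M_k)\Big\|_\infty \Big\}.$$

Then for all $t \ge 0$,

$$\Pr\Big( \Big\|\sum_k M_k\Big\|_\infty \ge t \Big) \le 2d \exp\left( -\frac{t^2/2}{\sigma^2 + Bt/3} \right). \tag{9}$$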



We will also require a vector-valued deviation estimate. While one could in principle obtain such a statement by applying Proposition 6 to diagonal matrices, a direct argument does away with the dimension factor $d$ on the r.h.s. of (9). This will save a logarithmic factor in the sampling rate of the Main Theorem. The particular vector-valued Bernstein inequality below is based on the exposition in [18], with a direct proof appearing in [17].

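Proposition 7 (Vector Bernstein inequality). Let $\{g_k\} \subset \mathbb{C}^d$ be a finite sequence of independent random vectors. Suppose that $\mathbb{E}[g_k] = 0$ and $\|g_k\|_2 \le B$ a.s. and put $\sigma^2 \ge \sum_k \mathbb{E}[\|g_k\|_2^2]$. Then for all $0 \le t \le \sigma^2/B$:

$$\Pr\Big( \Big\|\sum_k g_k\Big\|_2 \ge t \Big) \le \exp\left( -\frac{t^2}{8\sigma^2} + \frac{1}{4} \right).$$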
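For later reference, the two tail bounds can be written as plain functions. This is a sketch; in particular, the right-hand side of Proposition 7 is taken to be $\exp(-t^2/(8\sigma^2) + 1/4)$, the form consistent with the bound derived in Lemma 10. Comparing the two makes the dimension factor $2d$ of (9) explicit.

```python
import math


def matrix_bernstein_tail(t, sigma2, B, d):
    """Right-hand side of (9): 2d * exp(-(t^2 / 2) / (sigma^2 + B t / 3))."""
    return 2 * d * math.exp(-(t**2 / 2) / (sigma2 + B * t / 3))


def vector_bernstein_tail(t, sigma2, B):
    """Right-hand side of Proposition 7, only valid for 0 <= t <= sigma^2 / B."""
    if not 0 <= t <= sigma2 / B:
        raise ValueError("Proposition 7 requires 0 <= t <= sigma^2 / B")
    return math.exp(-t**2 / (8 * sigma2) + 0.25)
```

The vector version has no dimension factor, at the price of the restricted range of $t$, which is why the proofs below always check $t \le \sigma^2/B$.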




C. Fundamental estimates 



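We adopt the structure and nomenclature of this section from [5]. The following elementary bounds will be used repeatedly:

$$|\langle a_k, y\rangle|^2 \le s\mu \|y\|_2^2, \tag{10}$$

$$|\langle a_k, Xy\rangle|^2 \le s\mu \|y\|_2^2, \tag{11}$$

$$\|P_T a_k\|_2^2 \le \mu s, \tag{12}$$

$$\|P_T X a_k\|_2^2 \le \mu s. \tag{13}$$

They follow from (3), (4) and the Cauchy–Schwarz inequality, since $y$ is supported on $T$ and $|T| \le s$. Also, we will always assume that $m \ge s$.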



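Lemma 8 (Scaling). Let $a$ be a random vector such that $\mathbb{E}[aa^*]$ is invertible. There is a number $\nu$ such that, with $\tilde{a} := \nu a$, it holds that

$$\kappa_s = \lambda_{\max}(s, \mathbb{E}[\tilde{a}\tilde{a}^*]) = \lambda_{\min}(s, \mathbb{E}[\tilde{a}\tilde{a}^*])^{-1}$$

for a given $1 \le s \le n$ (the number $\nu$ depends on $s$). This rescaled ensemble fulfills

$$\kappa_s \mu \ge 1. \tag{14}$$

Proof. The first assertion follows immediately for

$$\nu = \big(\lambda_{\max}(s, \mathbb{E}[aa^*])\, \lambda_{\min}(s, \mathbb{E}[aa^*])\big)^{-1/4}.$$

For the second claim: By definition, $\mu \ge \max_i |\langle a, e_i\rangle|^2$ holds almost surely, so that in particular

$$\mu \ge \mathbb{E} \max_i |\langle a, e_i\rangle|^2.$$

For every $i$, the function $M \mapsto e_i^* M e_i$ is linear, which implies that $M \mapsto \max_i e_i^* M e_i$ is convex (as the pointwise maximum of convex functions). Since $\max_i |\langle a, e_i\rangle|^2 = \max_i e_i^* (aa^*) e_i$, Jensen's inequality applied to $M = aa^*$ gives

$$\mathbb{E} \max_i |\langle a, e_i\rangle|^2 \ge \max_i e_i^* \mathbb{E}[aa^*] e_i = \max_i \langle e_i, \mathbb{E}[aa^*] e_i\rangle \ge \langle e_1, \mathbb{E}[aa^*] e_1\rangle \ge \lambda_{\min}(s, \mathbb{E}[aa^*]).$$

Therefore $\mu \ge \lambda_{\min}(s, \mathbb{E}[aa^*])$. Together with $\kappa_s = \lambda_{\min}^{-1}(s, \mathbb{E}[aa^*])$, this implies $\mu\kappa_s \ge 1$. □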

The estimates in this proof are tight in the sense that there are ensembles for which each inequality above turns into an equality. A straightforward example for such an ensemble is given by picking super-normalized Fourier basis vectors $f_k$ (with coefficients $(f_k)_j = e^{2\pi i jk/n}$) according to the uniform probability distribution.
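The example can be checked numerically. A sketch, assuming the coefficients $(f_k)_j = e^{2\pi i jk/n}$ and the uniform distribution over $k$: then $\mathbb{E}[ff^*] = \mathbb{1}$ (so (2) holds and all condition numbers equal 1) and $|\langle f_k, e_j\rangle|^2 = 1$ for all $j, k$, so $\mu = 1$ and (14) holds with equality.

```python
import numpy as np

n = 8
idx = np.arange(n)
# Row k holds the super-normalized Fourier vector f_k with entries exp(2*pi*i*j*k/n).
F = np.exp(2j * np.pi * np.outer(idx, idx) / n)

cov = F.T @ F.conj() / n                  # E[ff^*] = (1/n) sum_k f_k f_k^*
mu = np.max(np.abs(F) ** 2)               # max_{k,j} |<f_k, e_j>|^2
kappa = np.linalg.cond(cov)               # condition number of E[ff^*] (here the identity)
print(np.allclose(cov, np.eye(n)), mu * kappa)
```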

Lemma 9 (Local isometry). Let $T$ and $P_T$ be as above. Then

$$\Pr\big( \|P_T (XA^*A - \mathbb{1}) P_T\|_\infty \ge r \big) \le 2s \exp\left( -\frac{m r^2}{2 s\mu\kappa_s (1 + 2r/3)} \right)$$

for each $0 < r \le 1/2$.
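Proof. Let us decompose the relevant expression:

$$P_T (XA^*A - \mathbb{1}) P_T = \frac{1}{m} \sum_{k=1}^{m} M_k,$$

where $M_k := P_T (X a_k a_k^* - \mathbb{1}) P_T$. Note that $\mathbb{E}[M_k] = 0$.

We aim to apply the Matrix Bernstein inequality. To this end, we estimate, using (12), (13) and (14),

$$\begin{aligned}
\|M_k\|_\infty &\le \|P_T X a_k a_k^* P_T\|_\infty + 1 = \|P_T X a_k\|_2 \|a_k^* P_T\|_2 + 1 \\
&\le \mu s + 1 \le 2\mu s \kappa_s =: B.
\end{aligned}$$

Furthermore:

$$\begin{aligned}
\|\mathbb{E}[M_k M_k^*]\|_\infty
&= \big\|\mathbb{E}\big[(P_T (X a_k a_k^* - \mathbb{1}) P_T)(P_T (a_k a_k^* X - \mathbb{1}) P_T)\big]\big\|_\infty \\
&= \big\|\mathbb{E}[P_T X a_k a_k^* P_T a_k a_k^* X P_T] - \mathbb{E}[P_T X a_k a_k^* P_T] - \mathbb{E}[P_T a_k a_k^* X P_T] + P_T\big\|_\infty \\
&= \big\|\mathbb{E}[P_T (X a_k \langle a_k, P_T a_k\rangle a_k^* X - \mathbb{1}) P_T]\big\|_\infty \\
&\le \max\big(\mu s \|\mathbb{E}[P_T X a_k a_k^* X P_T]\|_\infty,\ 1\big) \\
&\le \max(\mu s \|X_T\|_\infty,\ 1) \le \max(\mu s \kappa_s,\ 1) = \mu s \kappa_s.
\end{aligned}$$

Similarly,

$$\|\mathbb{E}[M_k^* M_k]\|_\infty = \big\|\mathbb{E}[P_T (a_k \langle a_k, X P_T X a_k\rangle a_k^* - \mathbb{1}) P_T]\big\|_\infty \le \max\big(s\mu \|\mathbb{E}[P_T a_k a_k^* P_T]\|_\infty,\ 1\big) \tag{15}$$

$$\max\big(s\mu \|\mathbb{E}[P_T a_k a_k^* P_T]\|_\infty,\ 1\big) = \max\big(s\mu \|P_T X^{-1} P_T\|_\infty,\ 1\big) \le \mu s \kappa_s. \tag{16}$$

In both chains we have used that $\|A - B\|_\infty \le \max(\|A\|_\infty, \|B\|_\infty)$ for positive semidefinite $A$, $B$, together with (12) and (13), respectively. Thus:

$$\max\Big\{ \Big\|\sum_k \mathbb{E}(M_k M_k^*)\Big\|_\infty,\ \Big\|\sum_k \mathbb{E}(M_k^* M_k)\Big\|_\infty \Big\} \le m s \mu \kappa_s =: \sigma^2.$$

Applying the Matrix Bernstein inequality for $s$-dimensional matrices ($P_T (XA^*A - \mathbb{1}) P_T$ has rank at most $s$) with $t = mr$ yields the desired result. □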
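The two variance estimates can be sanity-checked exactly for an ensemble that draws uniformly from finitely many vectors, where every expectation is a finite average. The sketch below (hypothetical helper; the constants in (3) and (4) are computed separately) compares $\|\mathbb{E}[M_kM_k^*]\|_\infty$ with $\max(\mu s\|X_T\|_\infty, 1)$ and $\|\mathbb{E}[M_k^*M_k]\|_\infty$ with $\max(s\mu\|P_T X^{-1} P_T\|_\infty, 1)$, i.e. the bounds before $\kappa_s$ is substituted; these intermediate bounds do not need the normalization (2).

```python
import numpy as np


def lemma9_variances(vectors, T):
    """Exact E[M], E[M M^*], E[M^* M] for M = P_T (X a a^* - 1) P_T, a uniform over rows."""
    a = np.asarray(vectors, dtype=complex)
    N, n = a.shape
    cov = a.T @ a.conj() / N              # E[aa^*]
    X = np.linalg.inv(cov)
    mask = np.zeros(n)
    mask[list(T)] = 1.0
    P = np.diag(mask)                     # projector P_T
    I = np.eye(n)
    Ms = [P @ (X @ np.outer(v, v.conj()) - I) @ P for v in a]

    def op(Z):
        return np.linalg.norm(Z, 2)       # operator norm (the ||.||_infty of the text)

    mu3 = np.max(np.abs(a) ** 2)                  # constant in (3)
    mu4 = np.max(np.abs(a.conj() @ X) ** 2)       # constant in (4)
    s = len(T)
    return {
        "mean": op(sum(Ms) / N),
        "EMM*": op(sum(M @ M.conj().T for M in Ms) / N),
        "bound EMM*": max(mu3 * s * op(P @ X @ P), 1.0),
        "EM*M": op(sum(M.conj().T @ M for M in Ms) / N),
        "bound EM*M": max(mu4 * s * op(P @ cov @ P), 1.0),
    }


rng = np.random.default_rng(7)
vecs = rng.standard_normal((40, 6)) * np.linspace(0.5, 2.0, 6)   # hypothetical ensemble
res = lemma9_variances(vecs, T=[0, 3])
print(res)
```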

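The estimate (15) is the only place in the proof where the second incoherence property (4) is essentially used. A careful analysis shows that in all other cases, one can do without it, possibly at the price of replacing $\kappa_s$ by $\kappa$ (which is the reason why we have not spelled it out). In order to obtain the results of Proposition 5, the bound (15) has to be modified. To arrive at (7), use

$$\begin{aligned}
\|\mathbb{E}[P_T a_k a_k^* X P_T X a_k a_k^* P_T]\|_\infty
&\le \mathbb{E}\,\|P_T a_k a_k^* X P_T X a_k a_k^* P_T\|_\infty \\
&= \mathbb{E}\big[\|P_T a_k\|_2^2\, \langle a_k, X P_T X a_k\rangle\big] \\
&\le s\mu\, \mathbb{E}\langle a_k, X P_T X a_k\rangle = s\mu\, \mathbb{E} \operatorname{tr}(a_k a_k^* X P_T X) \\
&= s\mu \operatorname{tr}(X^{-1} X P_T X) = s\mu \operatorname{tr}(P_T X) \le s^2 \mu \kappa_s.
\end{aligned}$$

And for (8):

$$\begin{aligned}
\|\mathbb{E}[P_T a_k a_k^* X P_T X a_k a_k^* P_T]\|_\infty
&= \big\|\mathbb{E}[P_T X a_k a_k^* P_T a_k a_k^* X P_T] + \mathbb{E}[P_T [a_k a_k^*, X] P_T X a_k a_k^* P_T] \\
&\qquad - \mathbb{E}[P_T X a_k a_k^* P_T [a_k a_k^*, X] P_T]\big\|_\infty \\
&\le \mu s \kappa_s + K \|\mathbb{E}[P_T X a_k a_k^* P_T]\|_\infty \\
&= \mu s \kappa_s + K \|P_T X X^{-1} P_T\|_\infty = \mu s \kappa_s + K.
\end{aligned}$$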

Lemma 10 (Low-distortion). Let $y$, $T$, $P_T$ be as above. For each $0 < r \le 1$ it holds that

$$\Pr\big( \|P_T (\mathbb{1} - A^*AX) y\|_2 \ge r \|y\|_2 \big) \le \exp\left( -\frac{m r^2}{16 s\mu\kappa_s} + \frac{1}{4} \right).$$



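Proof. The structure of the proof closely follows the one of Lemma 9. Set

$$g_k := P_T (\mathbb{1} - a_k a_k^* X) y,$$

so that $P_T(\mathbb{1} - A^*AX)y = \frac{1}{m}\sum_{k=1}^m g_k$ and $\mathbb{E}[g_k] = 0$. We bound

$$\begin{aligned}
\|g_k\|_2 &= \|P_T (\mathbb{1} - a_k a_k^* X) y\|_2 \le \|y\|_2 + \|P_T a_k \langle a_k, Xy\rangle\|_2 \\
&\le \|y\|_2 + s\mu \|y\|_2 \le 2 s\mu\kappa_s \|y\|_2 =: B,
\end{aligned}$$

$$\begin{aligned}
\mathbb{E}[\|g_k\|_2^2] &\le \mathbb{E}[\|P_T a_k \langle a_k, Xy\rangle\|_2^2] + \|y\|_2^2
= \mathbb{E}[\|P_T a_k\|_2^2\, |\langle a_k, Xy\rangle|^2] + \|y\|_2^2 \\
&\le s\mu\, \mathbb{E}[\langle Xy, a_k\rangle \langle a_k, Xy\rangle] + \|y\|_2^2
= s\mu \langle Xy, \mathbb{E}[a_k a_k^*] Xy\rangle + \|y\|_2^2 \\
&= s\mu \langle Xy, y\rangle + \|y\|_2^2 \le 2 s\mu\kappa_s \|y\|_2^2,
\end{aligned}$$

so that

$$\sum_{k=1}^m \mathbb{E}[\|g_k\|_2^2] \le 2 m s \mu\kappa_s \|y\|_2^2 =: \sigma^2$$

and thus $\sigma^2/B = m\|y\|_2$. The advertised statement follows by applying the vector Bernstein inequality for $t = mr\|y\|_2$, which is admissible since $r \le 1$. □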

Lemma 11 (Off-support incoherence). Let $y$, $T$ be as above. Then for each $r > 0$:

$$\Pr\big( \|P_{T^c} A^*AXy\|_\infty \ge r \|y\|_2 \big) \le 2n \exp\left( -\frac{3 m r^2}{2 \mu\kappa_s (3 + \sqrt{s}\, r)} \right).$$
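Proof. Fix $i \in T^c$ and use the following decomposition:

$$\langle e_i, A^*AXy\rangle = \frac{1}{m} \sum_{k=1}^m g_k,$$

where $g_k := \langle e_i, a_k a_k^* Xy\rangle = \langle e_i, a_k\rangle \langle a_k, Xy\rangle$. Note that we have

$$\mathbb{E}[g_k] = \langle e_i, \mathbb{E}[a_k a_k^*] Xy\rangle = \langle e_i, y\rangle = 0,$$

because $i \in T^c$. Bound:

$$|g_k| = |\langle e_i, a_k\rangle \langle a_k, Xy\rangle| \le \sqrt{s}\, \mu \|y\|_2 =: B,$$

and:

$$\begin{aligned}
\mathbb{E}[g_k \bar{g}_k] = \mathbb{E}[|g_k|^2] &= \mathbb{E}\big[|\langle a_k, e_i\rangle|^2\, |\langle a_k, Xy\rangle|^2\big]
\le \mu\, \mathbb{E}[\langle Xy, a_k a_k^* Xy\rangle] = \mu \langle Xy, y\rangle \\
&\le \mu \|X_T\|_\infty \|y\|_2^2 \le \mu\kappa_s \|y\|_2^2.
\end{aligned}$$

Therefore we can set $\sigma^2 := m\mu\kappa_s \|y\|_2^2$. Applying the Matrix Bernstein inequality for $d = 1$ with $t = mr\|y\|_2$ (and using $\kappa_s \ge 1$ to simplify the exponent), followed by the union bound over all $i \in T^c$, yields the claim. □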

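Lemma 12 (Uniform off-support incoherence). Let $T^c$, $P_T$ be as above. For $0 < r \le 1$ we have

$$\Pr\Big( \max_{i \in T^c} \|P_T X A^*A e_i\|_2 \ge r \Big) \le n \exp\left( -\frac{m r^2}{8 s\mu\kappa_s} + \frac{1}{4} \right).$$

Proof. Fix $i \in T^c$ and decompose:

$$P_T X A^*A e_i = \frac{1}{m} \sum_{k=1}^m g_k,$$

where $g_k := \langle a_k, e_i\rangle P_T X a_k$. It holds that $\mathbb{E}[g_k] = 0$. Next, bound

$$\|g_k\|_2 = |\langle a_k, e_i\rangle|\, \|P_T X a_k\|_2 \le \sqrt{s}\,\mu \le s\mu =: B.$$

Furthermore:

$$\mathbb{E}[\|g_k\|_2^2] \le \sum_{j \in T} \mu\, \mathbb{E}\langle e_j, X a_k a_k^* X e_j\rangle = \mu \sum_{j \in T} \langle e_j, X e_j\rangle \le s\mu\kappa_s.$$

We can therefore set $\sigma^2 := m s\mu\kappa_s$ and apply the Vector Bernstein inequality for $t = mr$, followed by the union bound over all $i \in T^c$. Noting that $\sigma^2/B = m\kappa_s \ge mr$ finishes the proof. □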
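The closed-form exponents in Lemmas 9 to 12 follow mechanically from Propositions 6 and 7 once $t$, $\sigma^2$ and $B$ are fixed. The following sketch (hypothetical helper names; $\|y\|_2 = 1$ since it cancels) plugs in the parameters used in the proofs and compares the results with the stated exponents, including the admissibility condition $t \le \sigma^2/B$ of Proposition 7. For Lemma 11 the stated exponent is smaller, because $\kappa_s \ge 1$ was used to simplify it.

```python
import math


def bernstein_exponent(t, sigma2, B):
    """Exponent (t^2/2) / (sigma^2 + B t / 3) appearing in (9)."""
    return (t**2 / 2) / (sigma2 + B * t / 3)


def lemma_exponents(m, s, mu, kappa, r):
    """(derived, stated) exponent pairs for Lemmas 9-12, with ||y||_2 = 1."""
    root_s = math.sqrt(s)
    return {
        # Lemma 9: matrix Bernstein, t = m r, sigma^2 = m s mu kappa, B = 2 mu s kappa.
        9: (bernstein_exponent(m * r, m * s * mu * kappa, 2 * mu * s * kappa),
            m * r**2 / (2 * s * mu * kappa * (1 + 2 * r / 3))),
        # Lemma 10: vector Bernstein exponent t^2 / (8 sigma^2), t = m r, sigma^2 = 2 m s mu kappa.
        10: ((m * r) ** 2 / (8 * 2 * m * s * mu * kappa),
             m * r**2 / (16 * s * mu * kappa)),
        # Lemma 11: scalar (d = 1) Bernstein, t = m r, sigma^2 = m mu kappa, B = sqrt(s) mu.
        11: (bernstein_exponent(m * r, m * mu * kappa, root_s * mu),
             3 * m * r**2 / (2 * mu * kappa * (3 + root_s * r))),
        # Lemma 12: vector Bernstein exponent, t = m r, sigma^2 = m s mu kappa.
        12: ((m * r) ** 2 / (8 * m * s * mu * kappa),
             m * r**2 / (8 * s * mu * kappa)),
    }


def vector_bernstein_admissible(m, s, mu, kappa, r):
    """t <= sigma^2 / B for Lemma 10 (B = 2 s mu kappa) and Lemma 12 (B = s mu)."""
    lemma10 = m * r <= (2 * m * s * mu * kappa) / (2 * s * mu * kappa)
    lemma12 = m * r <= (m * s * mu * kappa) / (s * mu)
    return lemma10, lemma12


print(lemma_exponents(m=1000, s=4, mu=2.0, kappa=3.5, r=0.5))
```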

D. Convex geometry 

Our aim is to prove that the solution $x^*$ to the optimization problem (1) equals the unknown vector $x$. One way of ensuring this is by exhibiting a dual certificate [19]. This method was first introduced in [2] and is now standard. We will use a relaxed version first introduced in [17] and later adapted from matrices to vectors in [5]. Our version further adapts the statement to the anisotropic setting.

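Lemma 13 (Inexact duality). Let $x \in \mathbb{C}^n$ be an $s$-sparse vector, let $T = \operatorname{supp}(x)$. Assume that

$$\|(P_T X A^*A P_T)^{-1}\|_\infty \le 2, \tag{17}$$

$$\max_{i \in T^c} \|P_T X A^*A e_i\|_2 \le 1 \tag{18}$$

(where the inverse in (17) is taken on the range of $P_T$) and that there is a vector $v$ in the row space of $A$ obeying

$$\|v_T - \operatorname{sgn}(x_T)\|_2 \le \frac{1}{4}, \tag{19}$$

$$\|v_{T^c}\|_\infty \le \frac{1}{4}. \tag{20}$$

Then the solution $x^*$ of the convex program (1) is unique and equal to $x$.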

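Proof. Let $\bar{x} = x + h$ be a solution of the minimization procedure. We note that feasibility requires $Ah = 0$. To prove the claim it suffices to show $h = 0$. Observe:

$$\begin{aligned}
\|\bar{x}\|_1 &= \|x_T + h_T\|_1 + \|h_{T^c}\|_1 \\
&= \langle \operatorname{sgn}(x_T + h_T), x_T + h_T\rangle + \|h_{T^c}\|_1 \\
&\ge \operatorname{Re}\langle \operatorname{sgn}(x_T), x_T + h_T\rangle + \|h_{T^c}\|_1 \\
&= \langle \operatorname{sgn}(x_T), x_T\rangle + \operatorname{Re}\langle \operatorname{sgn}(x_T), h_T\rangle + \|h_{T^c}\|_1 \\
&\ge \|x\|_1 - |\langle \operatorname{sgn}(x_T), h_T\rangle| + \|h_{T^c}\|_1.
\end{aligned}$$

Feasibility requires $\langle v, h\rangle = 0$ and therefore:

$$\begin{aligned}
|\langle \operatorname{sgn}(x_T), h_T\rangle| &= |\langle \operatorname{sgn}(x_T) - v_T, h_T\rangle + \langle v_T, h_T\rangle| \\
&= |\langle \operatorname{sgn}(x_T) - v_T, h_T\rangle - \langle v_{T^c}, h_{T^c}\rangle| \\
&\le |\langle \operatorname{sgn}(x_T) - v_T, h_T\rangle| + |\langle v_{T^c}, h_{T^c}\rangle| \\
&\le \|\operatorname{sgn}(x_T) - v_T\|_2 \|h_T\|_2 + |\langle v_{T^c}, h_{T^c}\rangle| \\
&\le \tfrac{1}{4}\|h_T\|_2 + |\langle v_{T^c}, h_{T^c}\rangle|,
\end{aligned}$$

where we have used (19). Together with (20), i.e.

$$|\langle v_{T^c}, h_{T^c}\rangle| \le \|v_{T^c}\|_\infty \|h_{T^c}\|_1 \le \tfrac{1}{4}\|h_{T^c}\|_1,$$

this implies:

$$|\langle \operatorname{sgn}(x_T), h_T\rangle| \le \tfrac{1}{4}\big(\|h_T\|_2 + \|h_{T^c}\|_1\big).$$

Furthermore, due to (17), (18) and $Ah = 0$:

$$\begin{aligned}
\|h_T\|_2 &= \|(P_T X A^*A P_T)^{-1} (P_T X A^*A P_T) h_T\|_2 \\
&= \|(P_T X A^*A P_T)^{-1} (P_T X A^*A)(h - h_{T^c})\|_2 \\
&= \|-(P_T X A^*A P_T)^{-1} (P_T X A^*A) h_{T^c}\|_2 \\
&\le 2 \|P_T X A^*A P_{T^c} h\|_2 \\
&\le 2 \max_{i \in T^c} \|P_T X A^*A e_i\|_2\, \|h_{T^c}\|_1 \\
&\le 2 \|h_{T^c}\|_1.
\end{aligned}$$

All this together implies:

$$\|\bar{x}\|_1 \ge \|x\|_1 - \tfrac{1}{4}\|h_T\|_2 + \tfrac{3}{4}\|h_{T^c}\|_1 \ge \|x\|_1 + \tfrac{1}{4}\|h_{T^c}\|_1.$$

Since $\bar{x}$ is a minimizer and $x$ is feasible, $\|\bar{x}\|_1 \le \|x\|_1$. This demands $\|h_{T^c}\|_1 = 0$, which in turn implies $\|h_T\|_2 = 0$, because $\|h_T\|_2 \le 2\|h_{T^c}\|_1$. Therefore $h = 0$, which corresponds to a unique minimizer ($\bar{x} = x$). □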
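The hypotheses of Lemma 13 are finite-dimensional checks and can be verified numerically for a given instance. A minimal sketch with a hypothetical helper, reading "row space of $A$" as the range of $A^*$ (which is what the proof uses, through $\langle v, h\rangle = 0$ whenever $Ah = 0$) and taking the inverse in (17) on the range of $P_T$:

```python
import numpy as np


def certificate_conditions(A, X, x, v, tol=1e-10):
    """Check (17)-(20) of Lemma 13 and that v lies in the range of A^*."""
    n = A.shape[1]
    T = np.flatnonzero(np.abs(x) > tol)
    Tc = np.setdiff1d(np.arange(n), T)
    G = X @ A.conj().T @ A                          # X A^* A
    # (17): inverse of P_T X A^* A P_T on the range of P_T.
    c17 = np.linalg.norm(np.linalg.inv(G[np.ix_(T, T)]), 2) <= 2
    # (18): column i of P_T X A^* A, restricted to T, for every i outside T.
    c18 = Tc.size == 0 or np.linalg.norm(G[np.ix_(T, Tc)], axis=0).max() <= 1
    # v = A^* w for some w?
    w, *_ = np.linalg.lstsq(A.conj().T, v, rcond=None)
    in_range = np.linalg.norm(A.conj().T @ w - v) <= tol * max(1.0, np.linalg.norm(v))
    sgn = x[T] / np.abs(x[T])
    c19 = np.linalg.norm(v[T] - sgn) <= 0.25
    c20 = Tc.size == 0 or np.abs(v[Tc]).max() <= 0.25
    return {"(17)": bool(c17), "(18)": bool(c18), "range": bool(in_range),
            "(19)": bool(c19), "(20)": bool(c20)}


# Trivial instance: A = X = identity, so v = sgn(x) is an exact certificate.
x = np.array([1.0, 0.0, -2.0, 0.0])
print(certificate_conditions(np.eye(4), np.eye(4), x, np.sign(x)))
```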

E. Construction of the certificate 

It remains to show that a dual certificate $v$ as described in Lemma 13 can indeed be constructed. We will prove:

Lemma 14. Let $x \in \mathbb{C}^n$ be an $s$-sparse vector, let $\omega \ge 1$. If the number of measurements fulfills

$$m \ge 18044\, \omega^2 \kappa_s \mu s \log n,$$

then with probability at least $1 - e^{-\omega}$, the constraints (17) and (18) will hold and a vector $v$ with the properties given in Lemma 13 exists.

This lemma immediately implies the Main Theorem.

The proof employs a recursive procedure (dubbed the "golfing scheme") to construct a sequence $v_i$ of vectors converging to a dual certificate with high probability. The technique has been developed in [12], [17] in the context of low-rank matrix recovery problems and has later been refined for compressed sensing in [5]. Here, we further modify the construction to handle anisotropic ensembles.

Proof. The recursive scheme consists of $l$ iterations. The $i$th iteration depends on three parameters, $m_i \in \mathbb{N}$ and $c_i, t_i \in \mathbb{R}$, which will be chosen in the course of the later analysis. To initialize, set

$$v_0 = 0$$

(the $v_i$ for $1 \le i \le l$ will be defined iteratively below). We will use the notation

$$q_i = \operatorname{sgn}(x_T) - P_T v_i.$$

The $i$th step of the scheme proceeds according to the following protocol: We sample $m_i$ vectors from the ensemble $F$. Let $\tilde{A}$ be the $m_i \times n$ matrix whose rows consist of these vectors (more precisely, of their adjoints $a_j^*$). We check whether the following two conditions are met:

$$\Big\|P_T \Big(\mathbb{1} - \frac{1}{m_i}\tilde{A}^*\tilde{A} X\Big) P_T q_{i-1}\Big\|_2 \le c_i \|q_{i-1}\|_2, \tag{21}$$

$$\Big\|P_{T^c} \frac{1}{m_i}\tilde{A}^*\tilde{A} X P_T q_{i-1}\Big\|_\infty \le t_i \|q_{i-1}\|_2. \tag{22}$$

If so, set

$$A_i = \tilde{A}, \qquad v_i = \frac{1}{m_i} A_i^* A_i X P_T \big(\operatorname{sgn}(x_T) - v_{i-1}\big) + v_{i-1}$$

and proceed to step $i + 1$. If either of (21), (22) fails to hold, repeat the $i$th step with a fresh batch of $m_i$ vectors drawn from $F$. Denote the number of repetitions of the $i$th step by $r_i$.

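We now analyze the properties of the above recursive construction.

The following identities are easily verified:

$$v := v_l = \sum_{i=1}^{l} \frac{1}{m_i} A_i^* A_i X P_T q_{i-1}, \tag{23}$$

$$q_i = \prod_{j=1}^{i} P_T\Big(\mathbb{1} - \frac{1}{m_j} A_j^* A_j X\Big) P_T \operatorname{sgn}(x_T), \tag{24}$$

where the product is ordered with the $j = i$ factor leftmost.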
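The protocol and the identities (23), (24) can be illustrated with a short simulation. This is a sketch only: `sample_batch` is a hypothetical callback returning i.i.d. draws from $F$ as rows, the covariance (hence $X$) is assumed known, and a retry cap `max_tries` replaces the unbounded repetition of the text. The example ensemble, a Gaussian with diagonal $\Sigma$, is an assumption for illustration and does not satisfy the almost-sure incoherence conditions.

```python
import numpy as np


def golfing(x, sample_batch, X, batch_sizes, c, t, max_tries=1000):
    """Sketch of the golfing scheme: returns v, [q_0, ..., q_l], batches, repetitions r_i."""
    n = x.size
    T = np.flatnonzero(x)
    pt = np.zeros(n)
    pt[T] = 1.0                                 # diagonal of the projector P_T
    sgn = np.zeros(n, dtype=complex)
    sgn[T] = x[T] / np.abs(x[T])                # sgn(x_T)
    v = np.zeros(n, dtype=complex)              # v_0 = 0
    q = [sgn.copy()]                            # q_0 = sgn(x_T) - P_T v_0
    batches, reps = [], []
    for m_i, c_i, t_i in zip(batch_sizes, c, t):
        for tries in range(1, max_tries + 1):
            rows = sample_batch(m_i)            # row j holds the draw a_j
            AA = rows.T @ rows.conj() / m_i     # (1/m_i) sum_j a_j a_j^*
            step = AA @ (X @ q[-1])             # (1/m_i) A~^* A~ X P_T q_{i-1}
            size = np.linalg.norm(q[-1])
            ok21 = np.linalg.norm(pt * (q[-1] - step)) <= c_i * size
            ok22 = np.max(np.abs((1 - pt) * step), initial=0.0) <= t_i * size
            if ok21 and ok22:
                break
        else:
            raise RuntimeError(f"step failed {max_tries} times")
        v = v + step                            # v_i = v_{i-1} + (1/m_i) A_i^* A_i X q_{i-1}
        q.append(sgn - pt * v)                  # q_i = sgn(x_T) - P_T v_i
        batches.append(rows)
        reps.append(tries)
    return v, q, batches, reps


rng = np.random.default_rng(1)
n = 30
sigma_half = np.diag(np.linspace(0.7, 1.4, n))  # hypothetical Sigma = E[aa^*]^{1/2}
X = np.linalg.inv(sigma_half @ sigma_half)      # X = E[aa^*]^{-1}


def sample_batch(k):
    return rng.standard_normal((k, n)) @ sigma_half   # rows a_j = Sigma g_j


x = np.zeros(n)
x[[1, 5]] = [2.0, -1.0]
v, q, batches, reps = golfing(x, sample_batch, X, [4, 4, 4], [np.inf] * 3, [np.inf] * 3)
```

With `c` and `t` set to infinity every batch is accepted, which is convenient for checking (23) and (24) numerically; with finite thresholds, rejected batches are simply redrawn.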



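Together with (21) and (22), one obtains

$$\|q_i\|_2 \le c_i \|q_{i-1}\|_2 \le \|q_0\|_2 \prod_{j=1}^{i} c_j = \|\operatorname{sgn}(x_T)\|_2 \prod_{j=1}^{i} c_j \le \sqrt{s} \prod_{j=1}^{i} c_j,$$

$$\|v_{T^c}\|_\infty \le \sum_{i=1}^{l} t_i \|q_{i-1}\|_2 \le \sqrt{s}\Big(t_1 + \sum_{i=2}^{l} t_i \prod_{j=1}^{i-1} c_j\Big).$$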



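Following [17], we choose the parameters $l$, $c_i$, $t_i$ as

$$l = \Big\lceil \frac{\log_2 s}{2} \Big\rceil + 2, \qquad c_1 = c_2 = \frac{1}{2\sqrt{\log n}}, \qquad t_1 = t_2 = \frac{1}{8\sqrt{s}},$$

and for $i \ge 3$

$$t_i = \frac{\log n}{8\sqrt{s}}, \qquad c_i = \frac{1}{2}.$$

A short calculation then yields

$$\|v_{T^c}\|_\infty \le \frac{1}{4}, \qquad \|v_T - \operatorname{sgn}(x_T)\|_2 = \|q_l\|_2 \le \frac{1}{4},$$

which are conditions (19) and (20).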
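The short calculation can be checked numerically. The sketch below (hypothetical helpers; natural logarithm, and $n \ge 3$ so that $\log n \ge 1$) evaluates the two bounds $\sqrt{s}\prod_j c_j$ and $\sqrt{s}\,(t_1 + \sum_{i \ge 2} t_i \prod_{j<i} c_j)$, assuming the parameter choice $l = \lceil \log_2(s)/2\rceil + 2$, $c_1 = c_2 = 1/(2\sqrt{\log n})$, $t_1 = t_2 = 1/(8\sqrt{s})$, and $c_i = 1/2$, $t_i = \log n/(8\sqrt{s})$ for $i \ge 3$.

```python
import math


def golfing_parameters(s, n):
    """l, [c_1..c_l], [t_1..t_l]; assumes n >= 3 so that log n >= 1."""
    l = math.ceil(math.log2(s) / 2) + 2
    c = [1 / (2 * math.sqrt(math.log(n)))] * 2 + [0.5] * (l - 2)
    t = [1 / (8 * math.sqrt(s))] * 2 + [math.log(n) / (8 * math.sqrt(s))] * (l - 2)
    return l, c, t


def certificate_bounds(s, n):
    """Upper bounds on ||q_l||_2 and ||v_{T^c}||_inf implied by (21) and (22)."""
    l, c, t = golfing_parameters(s, n)
    q_bound = math.sqrt(s) * math.prod(c)
    v_bound = math.sqrt(s) * sum(t[i] * math.prod(c[:i]) for i in range(l))
    return q_bound, v_bound


for s, n in [(1, 3), (10, 1000), (1000, 10**6)]:
    print(s, n, certificate_bounds(s, n))
```

The factor $2^{-(l-2)} \le 1/\sqrt{s}$ produced by the later steps absorbs the $\sqrt{s}$ from $\|\operatorname{sgn}(x_T)\|_2$, while the two initial steps with small $c_1, c_2$ compensate the larger $t_i$ for $i \ge 3$.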

Next, we need to establish that the total number of sampled vectors remains small with high probability. More precisely, we will bound the probability

$$p_3 := \Pr\Big( (r_1 > 1) \text{ or } (r_2 > 1) \text{ or } \Big(\sum_{i=1}^{l} r_i > l'\Big) \Big)$$

for some $l'$ to be chosen later.

To that end, denote by $p_1(i)$ the probability that (21) fails to hold in any given batch of the $i$th step. Analogously, let $p_2(i)$ be the probability of failure for (22). Lemmas 10 and 11 give the estimates

$$p_1(i) \le \exp\left( -\frac{m_i c_i^2}{16 s\mu\kappa_s} + \frac{1}{4} \right), \qquad
p_2(i) \le 2n \exp\left( -\frac{3 m_i t_i^2}{2\mu\kappa_s (3 + \sqrt{s}\, t_i)} \right).$$

We choose

$$l' = 4(\omega + \log 12 + l), \qquad m_1 = m_2 = 694\, s\mu\kappa_s \omega \log n,$$

and for $i \ge 3$

$$m_i = 694\, s\mu\kappa_s \omega.$$

Such a choice can be guaranteed by a total sampling rate $m \ge 18044\, s\mu\omega^2\kappa_s \log n$ and ensures

$$p_1(i) + p_2(i) \le \frac{1}{6} e^{-\omega} \le \frac{1}{12}$$

for all $i$ (the last inequality uses $\omega \ge 1$). (It is easily seen that for $n \gg 1$, a bound of $m \ge 228\, s\mu\kappa_s \omega^2 \log n$ is sufficient. The constants appearing here are highly unlikely to be optimal.) Note that $\sum_{i=1}^{l} r_i > l'$ only if fewer than $l$ of the first $l'$ batches of vectors satisfied both (21) and (22). This implies that



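$$\Pr\Big(\sum_{i=1}^{l} r_i > l'\Big) \le \Pr(N < l), \qquad N \sim \operatorname{Bin}(l', 11/12),$$

where the r.h.s. is the probability of obtaining fewer than $l$ successes in a binomial process with $l'$ repetitions and individual success probability $11/12$. We bound this quantity using a standard concentration bound from [20]:

$$\Pr\big( |\operatorname{Bin}(n, p) - np| > \tau \big) \le 2\exp\left( -\frac{2\tau^2}{n} \right).$$

This yields $\Pr\big(\sum_{i=1}^{l} r_i > l'\big) \le \frac{1}{6}e^{-\omega}$ for our choice of $l'$. Putting things together, we have

$$p_3 \le 3 \cdot \frac{1}{6} e^{-\omega} = \frac{1}{2} e^{-\omega}.$$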
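The binomial step can be checked numerically as well. The sketch assumes $l' = \lceil 4(\omega + \log 12 + l)\rceil$ (rounded up to an integer), $l = \lceil \log_2(s)/2 \rceil + 2$, and the two-sided Hoeffding form $2\exp(-2\tau^2/n)$ of the concentration bound, applied with $\tau = \frac{11}{12}l' - l$. It compares the exact probability $\Pr(\operatorname{Bin}(l', 11/12) < l)$, the Hoeffding estimate, and the target $\frac{1}{6}e^{-\omega}$; helper names are hypothetical.

```python
import math


def binomial_lower_tail(N, p, k):
    """Exact Pr(Bin(N, p) < k)."""
    return sum(math.comb(N, j) * p**j * (1 - p) ** (N - j) for j in range(k))


def repetition_tail(s, omega, p=11 / 12):
    """Exact tail, Hoeffding estimate and target e^{-omega}/6 for the batch count."""
    l = math.ceil(math.log2(s) / 2) + 2
    l_prime = math.ceil(4 * (omega + math.log(12) + l))
    exact = binomial_lower_tail(l_prime, p, l)
    tau = p * l_prime - l                 # N < l forces |N - p l'| > tau
    hoeffding = 2 * math.exp(-2 * tau**2 / l_prime)
    return exact, hoeffding, math.exp(-omega) / 6


for s in (1, 100, 10**6):
    for omega in (1.0, 3.0):
        print(s, omega, repetition_tail(s, omega))
```

Since $l \le l'/4$, one has $\tau \ge \frac{2}{3}l'$, so the Hoeffding estimate is at most $2\exp(-\frac{8}{9}l') \le 2\exp(-\omega - \log 12) = \frac{1}{6}e^{-\omega}$.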

In addition, we have to take into account that properties (17) and (18) can fail as well. We denote these probabilities of failure by $p_4$ and $p_5$. Lemma 9 (with $r = 1/2$) and Lemma 12 (with $r = 1$) give:

$$p_4 \le 2s \exp\left( -\frac{3m}{32\, s\mu\kappa_s} \right), \qquad
p_5 \le n \exp\left( -\frac{m}{8 s\mu\kappa_s} + \frac{1}{4} \right).$$

Our sampling rate $m$ guarantees $p_4 \le \frac{1}{4}e^{-\omega}$ as well as $p_5 \le \frac{1}{4}e^{-\omega}$. Applying the union bound now yields our desired overall error bound ($p_3 + p_4 + p_5 \le e^{-\omega}$). □

IV. CONCLUSION AND OUTLOOK 

In this paper, we have shown that proof techniques based on duality theory and the "golfing scheme" are versatile enough to handle the situation where the ensemble of measurement vectors is not isotropic.

An obvious future line of research would be to translate these results to the low-rank matrix recovery problem. Given the high degree of similarity between [17] and [5], this should be a conceptually straightforward task. This would further generalize the scope of this proof method, beyond orthonormal operator bases [17] and tight frames [8].

Also, Proposition 5 suggests that the second incoherence property (4) can be relaxed or maybe even disposed of. We leave this as an open problem.